The performance of sequential search can be enhanced by the use of heuristics that move elements closer to
the front of the list as they are found. Previous analyses have characterized the performance of such heuristics
probabilistically. In this paper we show that the heuristics can also be analyzed in the worst-case sense, and
that the relative merit of the heuristics under this analysis is different from that in the probabilistic analyses.


Simulations  show  that  the  relative  merit  of  the  heuristics  on  real  data  is  closer  to  that  of  the  new  worst-case 
1. Introduction

The performance of sequential search in an unsorted list can be enhanced by the use of self-organizing
heuristics that attempt to ensure that frequently accessed keys are near the front of the list.³ The following
three heuristics are representative of a larger class.

• Transpose. When the key is found, move it one closer to the front of the list by transposing it with
the key immediately in front of it.

• Move-to-Front. When the key is found, move it to the front of the list (all other keys retain their
relative order).

• Count. When the key is found, increment its count field (an integer that is initially zero) and move
it forward as little as needed to keep the list sorted in decreasing order by count.

Note  that  the  first  two  heuristics  require  no  memory  other  than  that  for  representing  the  lists,  while  the  third 
heuristic  requires  an  additional  count  field;  the  first  two  heuristics  will  therefore  be  called  memoryless. 
Previous  work  (described  in  the  next  section)  has  shown  that  under  various  probabilistic  assumptions,  these 
heuristics  can  significantly  reduce  the  time  required  by  sequential  search. 
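The three reordering rules can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the list is a Python list, `i` is the 0-based index at which the key was found, the function names are hypothetical, and the Count rule leaves a key in place when its count merely ties that of the key ahead of it.

```python
def transpose(keys, i):
    """Swap the key at index i with the key immediately in front of it, if any."""
    if i > 0:
        keys[i - 1], keys[i] = keys[i], keys[i - 1]


def move_to_front(keys, i):
    """Move the key at index i to the front; all other keys keep their relative order."""
    keys.insert(0, keys.pop(i))


def count_rule(keys, counts, i):
    """Increment the count of keys[i], then move it forward only as far as
    needed to keep the list in decreasing order of count (ties do not move)."""
    key = keys.pop(i)
    counts[key] = counts.get(key, 0) + 1
    j = i
    while j > 0 and counts.get(keys[j - 1], 0) < counts[key]:
        j -= 1
    keys.insert(j, key)


keys = ["a", "b", "c", "d"]
transpose(keys, 2)        # "c" trades places with "b"
move_to_front(keys, 3)    # "d" jumps to the front
```

Only `count_rule` needs the extra `counts` table, which is the sense in which the first two rules are memoryless.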

In this paper we will investigate the heuristics from a novel viewpoint: that of their worst-case performance
rather than their expected performance. Such an analysis is trivial and uninteresting if we consider the
worst-case cost of a single search. We will therefore count the worst-case number of comparisons made by the
heuristics for any particular sequence of search keys, and show that for the Move-to-Front and Count
heuristics, that number is at most twice the number of comparisons made when using the Optimal Static
Ordering (defined in the next section) for the sequence of requests. This result immediately implies the
strongest general theorem known for the expected time of the Move-to-Front heuristic. We also give a
counterexample that shows that the Transpose heuristic has very poor worst-case performance.

These analyses are of both theoretical and practical interest. This paper emphasizes the theoretical tools
used in the worst-case analyses. Our results provide a simpler proof of a stronger theorem regarding an
artifact that has been extensively studied for almost two decades. Furthermore, the analyses use a simple but
elegant bookkeeping technique of general interest. The practical contribution of this paper is not so much
prescriptive as descriptive: practitioners have long used the Move-to-Front heuristic even though theoretical
analyses indicated that the Transpose heuristic was superior. Our analysis provides a metric under which
Move-to-Front is superior to Transpose and thereby explains the actions of the practitioners.


³ We use the term list to refer only to its sequential nature: our results apply to lists implemented with arrays or with records and pointers.
This paper is organized in six sections. An overview of previous work can be found in Section 2. With that
background, the new results are presented in Section 3. Section 4 describes the results of empirical studies,
and advice to practitioners is offered in Section 5. The paper is summarized in Section 6.

2.  Previous  Work 

In this section we will survey previous results concerning self-organizing heuristics. Only the more
important results are presented here; for further study, consult the references at the end of the paper.

The heuristics (or rules) that we study deal with searches for elements of a set of $N$ keys stored in a list. A
particular query is answered by performing a sequential search for the requested key and then reordering the
list according to some search rule. A string of requests forms a request sequence. An important kind of request
sequence independently chooses the $i$th element with probability $p_i$ according to the probability distribution
$P = (p_1, p_2, \ldots, p_N)$. The cost of a search rule for such a distribution has been defined as the asymptotic expected
search cost for a single key (measured as the number of comparisons made) when the set is being reordered
according to the rule; we will denote this cost by $A_R(P)$ for rule $R$. The Optimal Static Ordering for the set is
one in which the keys are arranged in decreasing order of request probabilities and never reordered. While
this is not necessarily optimal over all rules (because it is static, rather than dynamic), it is used as a basis for
comparing the performance of the heuristics, and its cost will be denoted by $A_O(P)$. The heuristics have been
studied under this asymptotic model since 1965; we present the significant results below. The Count heuristic
is considered after the two memoryless heuristics.

The asymptotic expected search cost $A_M(P)$ under the Move-to-Front rule for the probability distribution
$P$ has been given by McCabe [1965], Burville and Kingman [1973], Knuth [1973], Hendricks [1976], Rivest
[1976], and Bitner [1979]. It is known that for any distribution $P$, the cost $A_M(P)$ is at most twice the cost of
the Optimal Static Ordering, $A_O(P)$. The asymptotic expected cost for Transpose, $A_T(P)$, was shown by
Rivest [1976] to be less than or equal to $A_M(P)$; this bound is strict for all $P$ except where all the nonzero
probabilities are equal or when $N = 2$.
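The factor-of-two relationship can be checked numerically for a given distribution. The sketch below relies on the closed form $A_M(P) = 1 + 2 \sum_{i<j} p_i p_j / (p_i + p_j)$ from the analyses cited above; that formula is not restated in this paper, and the function names are ours.

```python
from itertools import combinations


def static_cost(p):
    """A_O(P): expected comparisons with keys in decreasing order of probability."""
    ordered = sorted(p, reverse=True)
    return sum(rank * prob for rank, prob in enumerate(ordered, start=1))


def mtf_cost(p):
    """A_M(P) = 1 + 2 * (sum over pairs i<j of p_i * p_j / (p_i + p_j))."""
    return 1 + 2 * sum(a * b / (a + b) for a, b in combinations(p, 2) if a + b > 0)


def zipf(n):
    """Zipf's Law on n keys: p_i proportional to 1/i."""
    h = sum(1 / i for i in range(1, n + 1))
    return [1 / (i * h) for i in range(1, n + 1)]


p = zipf(100)
ratio = mtf_cost(p) / static_cost(p)   # at least 1 and below 2
```

For a uniform distribution the two costs coincide at $(N+1)/2$, which matches the equal-probability case in which the Transpose bound above is not strict.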

Rivest defined the optimal permutation rule to be one with least cost for all $P$ and any initial ordering of the
keys, and conjectured that Transpose is optimal. Yao (reported in Bitner [1976]) and Bitner [1982] have given
distributions where Transpose is optimal over all rules, but Anderson, Nash, and Weber [1982] presented a
counterexample to the conjecture by finding a rule that is better than Transpose for a specific distribution.
Recent work has examined modifications of the heuristics and asymptotic costs for classes of probability
distributions: see Gonnet, Munro and Suwanda [1979], Hendricks [1973], Bitner [1976, 1982], Kan and Ross
[1981], and Tenenbaum and Nemes [1982]. Zipf's Law is a natural distribution; Knuth [1973] showed that the
cost of Move-to-Front under this distribution is bounded by $2 \ln 2$ (about 1.386) times the cost of the Optimal
Static Ordering. The highest ratio yet found is $\pi/2$ (by Gonnet, Munro, and Suwanda [1979]), and the
existence of a bound tighter than $2 A_O(P)$ remains an open problem.

Measurements other than asymptotic cost have been considered. Bitner [1976, 1979] showed that while
Transpose is asymptotically more efficient, Move-to-Front converges more quickly. Therefore,
Move-to-Front is preferred when the number of requests is not large. He proposed a hybrid rule, which changes from
Move-to-Front to Transpose when the number of requests falls in a certain range: he suggests from $\Theta(N)$ to
$\Theta(N^2)$ requests as the change point for Zipf's Law. Bitner also discussed the overwork (the area between the
cost curve and its asymptote) for the two rules, and presented distributions for which Move-to-Front performs
much better than Transpose under this measure. Under Zipf's Law, for instance, the overwork is $O(N^2)$ for
Move-to-Front and is $\Omega(N^3)$ for Transpose.

Rivest [1976] introduced a range of move-ahead-k heuristics, where a requested key is moved ahead $k$
positions ($k = 1$ corresponds to Transpose and $k = N-1$ corresponds to Move-to-Front), and simulated the
asymptotic behavior of these heuristics for values of $k$ from 1 to 6 and values of $N$ from 3 to 12, for 5000
requests distributed by Zipf's Law. On the basis of those results, Bitner [1976] conjectured that for any two
heuristics in this range, one will converge faster and the other will have lower asymptotic cost; Gonnet,
Munro, and Suwanda [1979] later proved this. Tenenbaum [1978] performed similar tests for $N$ from 3 to 230
and for 12,000 requests, with $k$ from 1 to 7. His results indicate that for larger $N$ and this number of requests a
heuristic other than Transpose is more efficient.
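A move-ahead-k rule can be sketched as a one-line list operation. The function name and the 0-based index convention are our own assumptions; a key that is fewer than $k$ positions from the front simply stops at the front.

```python
def move_ahead(keys, i, k):
    """Move the key at index i forward by k positions, stopping at the front."""
    keys.insert(max(0, i - k), keys.pop(i))


keys = list("abcdef")
move_ahead(keys, 4, 2)    # "e" advances past "d" and "c"
```

With `k = 1` this behaves like Transpose, and with `k = len(keys) - 1` it behaves like Move-to-Front.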

The Count heuristic introduces a frequency count $f_i$ of requests for the key. Because of this extra
information, Count has not been considered to be in the same class as the first two heuristics and has received
less attention. By the law of large numbers, if $p_i > p_j$, then the frequency $f_i$ may be less than $f_j$ for only a finite
number of requests. The search cost under Count therefore asymptotically approaches that of the Optimal
Static Ordering. Bitner [1976] showed that if $P$ is not known beforehand, then Count is at all times optimal.

Various modifications of Count have been proposed to reduce the space needed to maintain the frequency
counts. Bitner [1976, 1979] suggested that it is better to maintain the differences between frequencies of
adjacent keys rather than their actual counts and also proposed a limited-difference rule, where the counts are
left unchanged after some upper limit is reached. Other modifications have been suggested for use in
combination with Move-to-Front or Transpose. Gonnet, Munro and Suwanda [1979], and Kan and Ross
[1980] have examined k-in-a-row heuristics, where a key is moved only after it has been requested $k$ times in a
row, and Bitner [1976] has analyzed rules of the form wait-c-and-move. Lam, Siu, and Yu [1981] presented a
scheme that was shown to be optimal over all heuristics that use frequency information.
3. Worst-Case Analyses

The primary results of the previous section can be summarized as follows: for any probability distribution
$P$,

$A_M(P) \le 2 A_O(P)$,
$A_T(P) \le A_M(P)$, and
$A_C(P) = A_O(P)$.

These results all deal with the asymptotic expected cost of a single search when the queries are from a
distribution $P$. In this section we will take an alternative view by considering the worst-case cost of
performing all searches in a given sequence of queries $S$. When the list is being reordered by rule $R$, we will
denote the total number of comparisons required for the sequence $S$ by $C_R(S)$ (signifying the concrete cost as
opposed to the asymptotic cost). The next two subsections will show that for any sequence $S$,

$C_M(S) \le 2 C_O(S)$, and
$C_C(S) \le 2 C_O(S)$.

In  Subsection  3.3  we  will  show  that  such  a  result  does  not  hold  for  the  Transpose  heuristic  by  exhibiting  a 
particular  sequence  with  very  poor  performance  under  that  rule. 

Before describing the results we must define precisely our cost functions. For the Move-to-Front, Count,
and Transpose rules we define $C_R(S)$ (the cost of rule $R$ on the request sequence $S$) by considering the effect
of $S$ on an initially empty search list. For each element $x$ of $S$ we in turn search the current list of size $m$ at a
cost of $i$ comparisons if $x$ is in position $i$ or $m$ comparisons if $x$ is not present (in which case we then insert $x$ at
the end of the list). In either case we reorder the list by rule $R$. The cost $C_O(S)$ of the Optimal Static Ordering
is fundamentally different: rather than starting with an initially empty list, each search uses the (unchanging)
list in which the keys are arranged in decreasing frequency of their counts in $S$. Note that this assumes that all
keys are known in advance, and implies that each search will be successful. The cost of finding an element in
position $i$ is $i$ comparisons.
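The two cost functions can be written down directly from these definitions. In the sketch below the names `dynamic_cost` and `static_cost` are ours, the rule is passed in as a function taking the list and the 0-based index of the requested key, and Move-to-Front serves as the example rule.

```python
from collections import Counter


def dynamic_cost(requests, reorder):
    """C_R(S): total comparisons for rule `reorder`, starting from an empty list."""
    keys, total = [], 0
    for x in requests:
        if x in keys:
            i = keys.index(x)
            total += i + 1          # found in position i + 1 (1-based)
        else:
            total += len(keys)      # compared against every key, then appended
            keys.append(x)
            i = len(keys) - 1
        reorder(keys, i)
    return total


def static_cost(requests):
    """C_O(S): total comparisons on the fixed list ordered by decreasing frequency in S."""
    freqs = sorted(Counter(requests).values(), reverse=True)
    return sum(pos * f for pos, f in enumerate(freqs, start=1))


def move_to_front(keys, i):
    keys.insert(0, keys.pop(i))


s = list("abacab")
c_m = dynamic_cost(s, move_to_front)
c_o = static_cost(s)
```

On this short sequence both costs come to 10 comparisons; the dynamic rule pays for some long searches but saves on first requests, which cost only as much as the list has grown so far.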

3.1. Move-To-Front Heuristic

In this subsection we will show that for any particular sequence of requests $S$, the number of comparisons
made by the Move-To-Front heuristic is never more than twice the number made under the Optimal Static
Ordering. To do this, we will reduce the problem to the case in which the request list contains just two distinct
keys, analyze that simple case, and finally combine several facts to complete the proof.

The total number of comparisons made for a given request sequence can be divided into two kinds of
comparisons: intraword comparisons (successfully) compare equal keys, and interword comparisons
(unsuccessfully) compare unequal keys. For any sequence, the number of intraword comparisons is invariant
under all heuristics. For the Move-To-Front heuristic, the total number of interword comparisons is the sum
over all distinct pairs of keys of the number of interword comparisons made between each pair. Furthermore,
for any sequence $S$ and all pairs of keys A and B, the number of interword comparisons of A to B is exactly
the number made for the subsequence of $S$ consisting solely of A's and B's. We call this the pairwise
independence property of the Move-To-Front heuristic; the number of comparisons made is dependent only
on the relative ordering of the A's and B's in the sequence and is independent of other keys. The proof of the
property is obvious: accessing an A will cause an (A,B) interword comparison if B is in front of A in the
search list, which is true if and only if the last B was accessed more recently than the last A. The other keys in
the request sequence do not affect this relationship.
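The pairwise independence property can be observed directly by counting the (A,B) comparisons in a full sequence and in its projection onto the two keys. The sketch below is an illustration only; the function name and the sample sequence are ours, and the list starts empty with new keys treated as found in the last position.

```python
def mtf_pair_comparisons(requests, a, b):
    """Count comparisons between keys a and b while serving `requests` with Move-to-Front."""
    keys, count = [], 0
    for x in requests:
        i = keys.index(x) if x in keys else len(keys)
        if x in (a, b):
            other = b if x == a else a
            # x is compared with every key ahead of it (the whole list if x is absent)
            if other in keys[:i]:
                count += 1
        if i == len(keys):
            keys.append(x)
        keys.insert(0, keys.pop(i))
    return count


s = list("acbbcaabcacb")
projected = [x for x in s if x in "ab"]
full = mtf_pair_comparisons(s, "a", "b")
alone = mtf_pair_comparisons(projected, "a", "b")
```

Both counts are 5 here: the requests for "c" change where "a" and "b" sit in the list, but never which of the two is ahead of the other.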

We will now demonstrate the following fact.

Fact 1.

The total number of interword comparisons made by the Move-To-Front heuristic on a sequence of
A's and B's is at most twice the number of interword comparisons made by the Optimal Static
Ordering applied to the same sequence.

To prove the fact we will assume that the sequence $S$ consists of $m$ A's and $n$ B's, where (without loss of
generality) $m \le n$. Under the Optimal Static Ordering, a total of $m$ interword comparisons will be made
(because the search list is always in the order B A, and so only requests for A will cause an interword
comparison). Under the Move-To-Front rule, an interword comparison will be made whenever the request
sequence changes from an A to a B or from a B to an A. The total number of such changes possible is just
twice the number of occurrences of A's (for each change involves an A, and each A can be involved in at most
two changes). We therefore know that the total number of comparisons made by Move-To-Front is at most
$2m$. Fact 1 follows immediately.
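Fact 1 is easy to confirm exhaustively for short sequences. The following sketch, with hypothetical function names, enumerates every sequence of A's and B's up to a given length and compares the Move-To-Front interword count against twice the count of the rarer key, which is the interword cost of the Optimal Static Ordering.

```python
from itertools import product


def mtf_interword(requests):
    """Interword comparisons made by Move-to-Front, starting from an empty list."""
    keys, count = [], 0
    for x in requests:
        i = keys.index(x) if x in keys else len(keys)
        count += i                  # every key ahead of x is an unequal key
        if i == len(keys):
            keys.append(x)
        keys.insert(0, keys.pop(i))
    return count


def fact1_holds(max_len):
    """Check Inter_M(S) <= 2 * min(#A, #B) for every A/B sequence up to max_len."""
    for length in range(1, max_len + 1):
        for s in product("AB", repeat=length):
            m = min(s.count("A"), s.count("B"))
            if mtf_interword(s) > 2 * m:
                return False
    return True
```

Calling `fact1_holds(10)` covers all 2046 sequences of length at most ten; an exhaustive check of this kind is no substitute for the proof, but it is a quick guard against misreading the cost model.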

We are now ready to prove the key fact of this subsection.

Fact 2.

For any sequence $S$, $C_M(S) \le 2 C_O(S)$.

We will prove this by simple algebra on the relations

$C_M(S) = \mathit{Intra}(S) + \mathit{Inter}_M(S)$ and
$C_O(S) = \mathit{Intra}(S) + \mathit{Inter}_O(S)$,

where $\mathit{Intra}$ and $\mathit{Inter}_R$ refer to the total number of comparisons of each type made by rule $R$. By Fact 1 we
know that each pair of keys satisfies the factor of two inequality; summing that inequality over all distinct
pairs gives

$\mathit{Inter}_M(S) \le 2\,\mathit{Inter}_O(S)$,

for any sequence $S$. Combining this inequality with the above definition gives Fact 2.
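Spelled out, the combination is the following chain, taking the two relations and the summed inequality as given; the last inequality only uses the fact that the number of intraword comparisons is nonnegative.

```latex
\begin{align*}
C_M(S) &= \mathit{Intra}(S) + \mathit{Inter}_M(S) \\
       &\le \mathit{Intra}(S) + 2\,\mathit{Inter}_O(S) \\
       &\le 2\,\mathit{Intra}(S) + 2\,\mathit{Inter}_O(S) \\
       &= 2\,C_O(S).
\end{align*}
```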

The factor of two in Fact 2 cannot be tightened; the request sequence
A B C D (D C B A)^m
has $C_O(S) \sim 2m$ but $C_M(S) \sim 4m$.

Knuth [1973, Exercise 6.1-11] shows that $A_M(P) \le 2 A_O(P)$ for any distribution $P$; that exercise is rated
M30, implying that it is mathematically oriented and may require over two hours' work to solve. Fact 2 allows
us to prove that result easily: we let the sequence $S$ be an arbitrarily long sequence chosen from the
distribution $P$. By Fact 2, we know that

$C_M(S) \le 2 C_O(S)$.

Let $C_{AOP}(S)$ denote the cost of applying the (asymptotically) optimal ordering for distribution $P$ to the
sequence $S$; because the Optimal Static Order for the sequence is optimal over all static orders, we know

$C_O(S) \le C_{AOP}(S)$.

These inequalities combine to show that

$C_M(S) \le 2 C_{AOP}(S)$.

The law of large numbers establishes that for an arbitrarily long sequence $S$,

$C_{AOP}(S) / |S| \sim A_O(P)$,

(exactly as in the previously mentioned analysis of the Count heuristic). By definition, we know that

$C_M(S) / |S| \sim A_M(P)$.

Combining these asymptotic facts with the third inequality yields the desired result.

3.2.  Count  Heuristic 

In this subsection we will show that the cost of the Count heuristic on any particular sequence is at most
twice the cost of the Optimal Static Ordering. Because the flow of this subsection is exactly the same as the
previous subsection, we will proceed at a faster rate.

The first fact that we must establish is that the Count heuristic has the pairwise independence property: for
any sequence $S$, the number of interword comparisons of A and B is exactly the number made for the
subsequence of $S$ consisting solely of A's and B's. This is easily proved: A will precede B in the search list if
and only if A has a count greater than B's count, or in the case of equal counts, if A's count was most recently
greater than B's. In either case, the positions are not affected by other keys. As in the Move-To-Front
heuristic, this pairwise independence allows us to focus on two-element sequences. We therefore prove the
following fact.

Fact 3.

The total number of interword comparisons made by the Count heuristic on a sequence of A's and B's
is at most twice the number of interword comparisons made by the Optimal Static Ordering applied to
the same sequence.

To prove this fact we again assume that the sequence $S$ consists solely of $m$ A's and $n$ B's, with $m \le n$. Under
the Count heuristic, an interword comparison is made every time the second key in the search list is
requested; at that time, its count field is incremented. The count field of A can be incremented while it is in
the rear at most $m$ times (because it is requested $m$ times). Furthermore, the count field of B can be
incremented while it is in the rear at most $m$ times (because after that it has been requested more than $m$ times
and can no longer be in the rear). The number of interword comparisons, then, is bounded by $m$ (requests for
A's) plus $m$ (for B's), or $2m$, which is twice the number of comparisons made by the Optimal Static Ordering.
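As with Move-To-Front, the two-key bound for Count can be confirmed exhaustively on short sequences. In this sketch the function names are ours, the list starts empty, a new key is appended with its count set to one, and a key moves forward only past keys with strictly smaller counts.

```python
from itertools import product


def count_interword(requests):
    """Interword comparisons made by the Count heuristic, starting from an empty list."""
    keys, counts, total = [], {}, 0
    for x in requests:
        i = keys.index(x) if x in keys else len(keys)
        total += i                  # keys ahead of x are all unequal to x
        if i == len(keys):
            keys.append(x)
        counts[x] = counts.get(x, 0) + 1
        # move x forward only past keys with strictly smaller counts
        while i > 0 and counts[keys[i - 1]] < counts[x]:
            keys[i - 1], keys[i] = keys[i], keys[i - 1]
            i -= 1
    return total


def fact3_holds(max_len):
    """Check Inter_C(S) <= 2 * min(#A, #B) for every A/B sequence up to max_len."""
    for length in range(1, max_len + 1):
        for s in product("AB", repeat=length):
            if count_interword(s) > 2 * min(s.count("A"), s.count("B")):
                return False
    return True
```

The sequence A B B A A meets the bound with equality: it makes four interword comparisons and contains two B's.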

The key fact of this subsection follows from the same kind of reasoning used to establish Fact 2 of the
previous subsection.

Fact 4.

For any sequence $S$, $C_C(S) \le 2 C_O(S)$.

Again, the reasoning involves summing over the factor of two inequality. By an example similar to that in the
previous section, the factor of two cannot be tightened.

3.3.  Transpose  Heuristic 

In this subsection we will demonstrate that the worst-case ratio of the performance of Transpose to that of
the Optimal Static Ordering cannot be bounded by any constant. This is easily observed if we consider the
request sequence

A B C D E (E D)^k

After the first five elements are stored by Transpose, the sequence of (E D) request pairs will cause those two
elements to swap position at the back of the list, and neither will advance. The average cost of a search in this
sequence will therefore approach 5, while under the Optimal Static Ordering 1.5 comparisons would suffice.
For increasing $k$, this example gives

$C_T(S) > 3.33\,C_O(S)$.

The constant 3.33 can be increased to approximately $2N/3$ by increasing the length of the "filler" sequence preceding the
"active" pair to $N-2$. Note that this counterexample exploits the fact that the Transpose heuristic does not
have the pairwise independence property: the relative order of any two keys depends not only on the request
sequence but also on whether the keys are adjacent in the search list.
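The counterexample is simple to simulate. The sketch below makes one assumption that should be stated loudly: a key seen for the first time is appended without being transposed, which reproduces the per-search cost of 5 quoted above (if new keys were also transposed on insertion, the active pair would oscillate one position further forward). The function names and the choice of 5000 repetitions are ours.

```python
from collections import Counter


def transpose_cost(requests):
    """Total comparisons under Transpose, starting from an empty list.

    Assumption: a new key is appended at the end and is not transposed.
    """
    keys, total = [], 0
    for x in requests:
        if x in keys:
            i = keys.index(x)
            total += i + 1
            if i > 0:
                keys[i - 1], keys[i] = keys[i], keys[i - 1]
        else:
            total += len(keys)
            keys.append(x)
    return total


def static_cost(requests):
    """Total comparisons on the fixed list ordered by decreasing frequency."""
    freqs = sorted(Counter(requests).values(), reverse=True)
    return sum(pos * f for pos, f in enumerate(freqs, start=1))


k = 5000
s = list("ABCDE") + list("ED") * k
ratio = transpose_cost(s) / static_cost(s)
```

Under these assumptions the totals are 10 + 10k and 3k + 15 comparisons, so the ratio climbs toward 10/3 as k grows and passes 3.33 only for k in the thousands.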

4.  Empirical  Results 

The  theoretical  analyses  in  Sections  2  and  3  are  by  no  means  unanimous  in  their  evaluation  of  the 
heuristics.  To  gain  further  insight  into  the  behavior  of  the  heuristics,  we  used  each  to  perform  word  counts  on 
a  variety  of  files;  that  is,  the  words  in  each  file  served  as  a  request  sequence.  As  each  word  (defined  to  be  a 
lower-case  alphanumeric  string  delimited  by  spaces  or  punctuation  marks,  which  are  ignored)  was  requested, 
a  linear  search  of  the  key  list  was  performed,  the  count  field  for  the  key  was  incremented,  and  the  list  was 
reordered  according  to  the  appropriate  rule.  Each  trial  started  with  an  initially  empty  key  list;  at  the  first 
request  for  a  word,  the  list  was  searched  to  the  end  to  determine  its  absence  and  then  the  reordering  occurred 
as  if  the  element  had  been  found  in  the  (new)  last  position.  Although  this  application  clearly  suggests  the 
Count  heuristic  (since  the  frequencies  must  be  stored  anyway),  this  type  of  input  is  one  indicator  of  the 
behavior  of  the  heuristics  under  natural  conditions. 

The average search cost (defined as the total number of interword comparisons divided by the number of
requests) required by each heuristic for each file is reported in Table 1; the best performance for each file is
underlined. Under a uniform distribution of request frequencies the average static search cost is
approximately $D/2$, where $D$ is the length of the search list. We might expect better results for this data,
however, because the distribution of request frequencies in many natural contexts obeys Zipf's Law; the
average cost for the Optimal Static Ordering of that distribution is approximately $D / \ln D$ (see Knuth [1973,
Section 6.1]). The column in Table 1 entitled "Zipf's Law" gives the cost of the Optimal Static Ordering if the
requests had been drawn from that distribution; comparing that column to the cost of the Optimal Static
Ordering shows that the data is closer to a Zipf distribution than a uniform distribution.
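The Zipf figure can be computed exactly rather than approximated. Under Zipf's Law the expected number of comparisons per request for the Optimal Static Ordering is $D / H_D$, where $H_D$ is the $D$th harmonic number (close to $\ln D$ for large $D$). Subtracting the one successful comparison per request, as the interword definition of average search cost suggests, appears to reproduce the tabulated column, for example about 18.28 for $D = 100$; that subtraction is our reading of the table rather than something the text states, and the function name is hypothetical.

```python
def zipf_static_interword_cost(d):
    """Expected interword comparisons per request for the Optimal Static
    Ordering of d keys under Zipf's Law (p_i proportional to 1/i)."""
    h = sum(1 / i for i in range(1, d + 1))                    # harmonic number H_d
    total = sum(i * (1 / (i * h)) for i in range(1, d + 1))    # equals d / H_d
    return total - 1                                           # drop the successful comparison


p1 = zipf_static_interword_cost(100)
```

The same function gives about 68.95 for $D = 471$.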

The  files  were  obtained  from  user  accounts  and  an  on-line  documentation  system,  and  were  grouped  into 
two  classes:  Pascal  files  and  Text  files.  The  characteristics  of  the  classes  vary,  so  we  consider  the  results  for 
each  separately. 

The four Pascal files tested contained between 100 and 181 distinct words (corresponding to the length of
the key list), which were requested a total of 431 to 1456 times (corresponding to the length of the request
sequence). The empirical results for the Pascal files were striking: Move-to-Front and Count performed
dramatically better than Transpose, and in two cases, Move-to-Front required fewer comparisons than the
Optimal Static Ordering. The high locality present in source code accounts for this surprising phenomenon:
infrequently used words such as integer appear in groups rather than being uniformly distributed throughout
the file. Where a request for such a word would require a long search under the Optimal Static Ordering, the
search under Move-to-Front would be short after the first request, since the key would then be at the
beginning of the list. The Count heuristic can also exploit the locality of keywords such as real and integer; at
the beginning of the program text, their counts will be higher than the counts of other words. For Transpose,
the requested element may not have time to drift towards the front and high-locality words that occur in the
same neighborhood can contend with one another, so the search remains expensive. This phenomenon of
locality in such constructs as Total := Total + 1, end; end; end, and var declarations enhances the
performance of Move-to-Front considerably.


The Text files included the text of the Constitution of the United States (T6 in Table 1), the script to The
Rocky Horror Picture Show (T5), a version of this paper (T4), excerpts from an on-line documentation
system, and text files augmented with instructions to the Scribe document production system. While
Transpose still required more comparisons than the other two heuristics, its performance was better for this
class of files. Move-to-Front performed best in most cases, although it never beat the Optimal Static
Ordering. The Count heuristic was never far behind Move-to-Front, and in two cases performed better than
Move-to-Front.


File   Distinct     Total    Zipf's   Optimal  Move-to-     Count  Transpose
          Words     Words       Law    Static     Front
                                     Ordering

Pascal Files
P1          100       480     18.28     27.52     24.42     33.16      40.43
P2          107       431     19.36     26.23     25.62     31.50      38.92
P3          117     1,176     20.90     18.04     18.21     20.63      30.53
P4          181     1,456     30.32     30.78     31.40     35.71      47.41

Text Files
T1          471     1,888     68.95     93.03    104.46    111.31     147.41
T2          498     1,515     72.36    112.86    119.31    135.63     160.69
T3          564     3,296     80.38     96.29     98.90    112.41     155.17
T4          999     5,443    132.48    149.34    168.79    175.42     258.20
T5        1,147     7,482    149.47    143.72    174.30    166.10     204.74
T6        1,590     7,654    199.02    232.33    280.83    267.64     349.94

Table 1. Average Search Costs.

The empirical results indicate that neither the worst-case nor the probabilistic analyses by themselves
completely describe the behavior of the heuristics under natural conditions: Transpose clearly is not the best
heuristic for this application, yet it never performed as badly as our results showed it might. Certainly, the
distribution and size of the request sequence seem less significant than the ordering of the requests. Empirical
results for the Pascal files would be less dramatic if Pascal reserved words were treated differently from
identifiers, as might happen within a compiler. If sequential search were used by an interpreter for identifier
lookup at runtime, however, the presence of dynamic locality (for example, in requests for loop variables)
would argue strongly for the use of Move-to-Front.⁴ This phenomenon is not restricted to source code; for
example, the word "president" appears with high locality in the U.S. Constitution. In most written prose,
locality of subject (and therefore of certain words) determines paragraph construction. Move-to-Front is able
to take advantage of this characteristic. Indeed, such a phenomenon as "pairwise-locality" (the appearance of
word pairs or two-word phrases such as "vice-president" or A(I,J)) might hamper the performance of
Transpose; if two such words have the misfortune to be adjacent in the key list, then they will contend with
each other rather than drifting to the front as they should.

5.  Advice  to  Practitioners 

The  previous  sections  have  evaluated  the  heuristics  from  various  viewpoints,  using  theoretical  tools  as  well 
as  test  results  for  several  data  sets.  We  now  consider  the  heuristics  from  a  very  different  perspective:  how 
should  they  be  used  by  practicing  programmers? 

The purpose of the heuristics is to increase the performance of a linear search. This raises the most
important point of this section: if a programmer faces an efficiency problem in a search procedure, then linear
search is probably not the method of choice. Knuth [1973, Chapter 6] describes a number of other search
methods that are usually significantly more efficient. There are, however, contexts in which self-organizing
linear search may be appropriate.

• When $N$ is very small (say, at most several dozen), the greater constant factors in the runtimes of
other strategies may make linear search competitive. This occurs, for example, when linked lists
are used to resolve collisions in a hashing structure.

•  When  space  is  severely  limited,  sophisticated  data  structures  may  be  too  space-expensive  to  use. 

•  If  the  performance  of  linear  search  is  almost  (but  not  quite)  good  enough,  a  self-organizing 
heuristic  may  make  it  effective  for  the  application  at  hand  without  adding  more  than  a  few  lines  of 
code. 


The widely-used Microsoft BASIC interpreter stores symbols in its run-time (linear) symbol table in the order in which they were first
seen. One of the authors (JLB) once reduced the run time of a production BASIC program under such an interpreter from fourteen
hours to seven hours simply by referring to each "hot" variable once in a dummy statement at the front of the program. The use of the
Move-to-Front heuristic in such an interpreter would probably substantially decrease the run time of many BASIC programs.

Ed McCreight [1983] found himself in the last situation when improving the performance of a VLSI circuit
simulator that had two primary phases: the first phase read the description of the circuit and the second phase
then simulated the circuit. On typical runs the first phase would take five minutes while the second phase
would take several hours. Although the five minutes of the first phase was not crucial, it was irritating for
users to have to wait that long to see the simulation begin (especially when they knew that most of the time
was going to sequential searches in the simulator's symbol table). Following McCreight's suggestion, the
implementer of the program augmented the straightforward sequential search with the Move-to-Front
heuristic. Those additional half-dozen lines of code decreased the runtime of the first phase from five minutes
to half a minute (most of which was not going to symbol table routines).
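To make concrete how little code such a change involves, here is a minimal sketch of a linked-list symbol table whose lookup applies Move-to-Front. It is not the simulator's actual code, which is not given here; the names `Node`, `MoveToFrontList`, `insert`, `lookup`, and `keys` are hypothetical, and the sketch assumes that keys are unique.

```python
class Node:
    """One entry of a singly linked list."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class MoveToFrontList:
    """Sequential-search table that moves each found key to the front."""

    def __init__(self):
        self.head = None

    def insert(self, key, value):
        # New keys go to the front; this sketch does not check for duplicates.
        self.head = Node(key, value, self.head)

    def lookup(self, key):
        prev, node = None, self.head
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return None                      # unsuccessful search: list unchanged
        if prev is not None:                 # found, and not already first
            prev.next = node.next            # unlink the node ...
            node.next = self.head            # ... and relink it at the front
            self.head = node
        return node.value

    def keys(self):
        """Current key order, front to back."""
        out, node = [], self.head
        while node is not None:
            out.append(node.key)
            node = node.next
        return out
```

The reorganization itself is the three pointer assignments in `lookup`; the rest is an ordinary sequential search.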

The lesson to be learned from the above paragraph is that when efficiency matters in a search routine, then
non-linear data structures (especially hashing) should be seriously considered. Sometimes, however,
self-organizing heuristics can be exactly the right tool for the job by providing enough runtime efficiency with
little overhead in code development.

Knowing when to use self-organization heuristics still leaves the implementer with the decision of which
one to choose in a given application. Some authors have interpreted the results in Section 2 in a way we feel is
unwarranted: for instance, Gotlieb and Gotlieb [1978, p. 118] assert in their excellent data structures text that
"[Move-to-Front] is not the best [strategy] for a self-organizing list. It is better to promote the referenced entry
only one place by transposing it with its predecessor." The following discussion of the heuristics is relevant to
most situations in which self-organizing schemes are applicable.

• Move-to-Front. The linked list implementation of this heuristic is the method of choice for most
applications. The heuristic makes few comparisons, both in the worst case and when observed on
real data; furthermore, it exploits any locality of reference present in the input. The linked list
implementation is natural for an environment supporting dynamic storage allocation and yields an
efficient reorganization strategy. Unfortunately, moving to front is expensive if the sequence is
implemented as an array.

• Transpose. If storage is extremely limited and pointers for lists cannot be used, then the array
implementation of Transpose gives very efficient reorganization. Its worst-case number of
comparisons is high, but it performs well on the average.

• Count. Although this heuristic does make a small number of comparisons in the worst case, its
extra storage and higher move costs make it unattractive for most applications. It should probably
be considered only in applications in which the counts are already needed for other purposes.
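The array-based alternatives discussed above can be sketched as follows. This is an illustration only, with hypothetical names (`transpose_lookup`, `count_lookup`) and an assumed representation in which Count stores each entry as a `[key, count]` pair; it is not a tuned implementation.

```python
def transpose_lookup(keys, key):
    """Array-based Transpose: swap the found key with its predecessor.

    Returns the key's new index, or -1 if the key is absent.
    """
    for i, k in enumerate(keys):
        if k == key:
            if i > 0:
                keys[i - 1], keys[i] = keys[i], keys[i - 1]
                return i - 1
            return 0
    return -1


def count_lookup(entries, key):
    """Count heuristic on a list of [key, count] pairs.

    The list is kept in nonincreasing order by count. The found entry is
    moved forward only past entries whose count is strictly smaller.
    Returns the entry's new index, or -1 if the key is absent.
    """
    for i, entry in enumerate(entries):
        if entry[0] == key:
            entry[1] += 1
            j = i
            while j > 0 and entries[j - 1][1] < entry[1]:
                entries[j] = entries[j - 1]  # shift the smaller-count entry back
                j -= 1
            entries[j] = entry
            return j
    return -1
```

Transpose needs a single swap per successful search and no extra storage, whereas Count carries one integer per key and may shift several entries on one access, which reflects the storage and move costs noted above.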

In the above discussion we have intentionally kept vague several potentially quantifiable measures. Rather,
we appeal to an intuition that asymptotically efficient algorithms tend to require more code and to have larger
constant factors. We have avoided hard data because it is extremely sensitive in this context to coding style
and to details of the compiler and machine architecture. Readers who insist on such detail should consult
Knuth [1973, Chapter 6], but we warn that data on his MIX implementations may be misleading for other
computing environments.

6.  Conclusions 

The conclusions of this paper are clear: when a self-organizing sequential search is appropriate in an
application, the Count and (especially) the Move-to-Front heuristics should be considered for
implementation. Although previous probabilistic analyses showed that Transpose is superior to
Move-to-Front under some measures, both our worst-case analyses and our empirical results show contexts in which
the opposite is true.

The  theoretical  results  in  this  paper  could  be  extended  in  a  number  of  ways.  An  implementer  of  these 
algorithms  may  wish  to  consider  measurements  other  than  number  of  comparisons,  such  as  number  of  moves 
or  total  distance  moved.  The  worst-case  analysis  of  algorithms  previously  analyzed  only  for  their  expected 
performance  is  an  interesting  open  problem.  To  predict  more  accurately  the  behavior  of  the  heuristics  on 
input  like  that  described  in  the  previous  section,  it  would  be  helpful  to  have  theoretical  tools  for  describing 
the locality present in the input. Also, the proof techniques that we presented could be used to study other
self-modifying  structures  in  the  worst-case  sense.3 